home *** CD-ROM | disk | FTP | other *** search
- CH 8 − USING CSV FILES
- ======================
- 8.1 What are CSV files?
- ========================
- CSV stands for “comma-separated values” and is the name given to data files
- in which each item of data is separated from the next by a comma. Such files
- are widely used to transfer data from one application to another, e.g. from
- a database to a spreadsheet or from a RISC OS database to a PC database. The
- following points should be noted:−
-
- (1) Data items which are non-numeric (e.g. plain text items such as
- names and addresses) are often enclosed in double quotes (“”)
- whereas numeric items are not. This makes it possible for an
- application reading a CSV file to distinguish between numbers and
- strings (which might of course contain numerals) and also allows a
- comma to be used as a character within a string without being
- mistaken for a data separator. (Addresses often contain commas e.g.
- 112, Keighley Road). For many purposes the quotes aren’t necessary
- and may be omitted.
-
- (2) Null data items are usually included and can be located by looking
- for two commas with either nothing in between them or with only two
- quotation marks between them, i.e. ,, or ,“”,. Each line of data
- (record) in such a file will always contain the same number of data
- items (fields). Powerbase does, however, allow nulls to be omitted
- entirely when creating a file in which case the number of fields per
- record will vary.
-
- (3) The way in which each record is terminated varies from one system to
- another. The last item of data in a record is not followed by a
- comma but by a line terminator. On RISC OS systems this is the
- line-feed character (ASCII value 10) and on PCs it is usually the
- carriage-return character (ASCII value 13). You might, however,
- encounter CSV files in which both these characters are used, i.e. LF
- CR or CR LF.
-
- (4) Separator other than commas are sometimes used. The Tab character
- (ASCII value 9) is often used and such files are called TSV files.
- Powerbase allows you to separate data items and terminate lines (see
- (3) above) with any character or character pair.
-
- (5) Some applications which accept a CSV file as input expect the first
- line to contain the names of the fields which comprise the
- subsequent records, e.g. if each record consists of a name and a
- four-part address this header record might read:−
-
- “NAME”,“STREET”,“TOWN”,“COUNTY”,“POSTCODE”
-
- Powerbase is able to save data in the form of true CSV files or files using
- another data separator. If the separator is a comma then the file created is
- of type &dfe and its icon displays the letters CSV. If another separator,
- eg. Tab, is used (see (4) above) then the file is of the ordinary Text type,
- i.e. &fff. Files of these types may also be used to enter data into a
- Powerbase database. In what follows we will, for convenience, refer to them
- all as “CSV files” whether or not the separator is a comma. The choices on
- the main menu which control these actions are Export CSV and CSV options.
-
- 8.2 Setting the CSV options
- ============================
- The main-menu choice CSV options leads to a dialogue box which lets you
- specify all the file characteristics described earlier. Selection of the
- first three option buttons causes a saved file to have, respectively:−
-
- • quotes round non-numeric fields (see (1) above)
- • a header record specifying the field names (see (5) above).
- • null fields included (see (2) above).
-
- Note that the field names referred to in (b) are, by default, the tags of
- the corresponding Powerbase fields, but may be changed to the descriptors by
- altering the setting in the Print options window. The field separator and
- record terminator may be changed by means of menus which appear when you
- click on the pop-up menu icons. Each of these menus includes a writable
- entry allowing you to define your own separator and/or terminator of one or
- two printing characters.
-
- 8.3 Exporting data as a CSV file
- =================================
- Having set up your options as described above, creating a CSV file is very
- like printing a list. First highlight the fields to be exported by clicking
- on each with ADJUST. Remember that the order in which the fields are
- highlighted is important. Next choose Export CSV from the main menu. A
- window appears featuring the Query panel. Type in the search formula to
- determine which records are exported. Finally, enter the name of the file
- and drag the file icon to a filer window. By default the file is saved in
- PrintJobs as usual, and you may simply click on Export or type Return.
- Experiment with saving CSV files with different settings of the CSV options
- and then loading the resulting files into Edit to examine them. If you
- select the Reverse switch on the Query panel the CSV file will be created in
- reverse order. (See also 3.3.5)
-
- 8.4 Using CSV files to import data
- ===================================
- If you drag a CSV file onto the record display of an open database the
- options window (see CSV options) appears with a changed title and some
- additional icons, one of which displays the pathname of the CSV file.
- Clicking on Import will make Powerbase try to create new database records
- from the file. If you decide not to do this you should click on Cancel. If
- you do wish to import the data there are some important consideration which
- will now be explained.
-
- 8.4.1 Ensuring that the correct options are selected
- ----------------------------------------------------
- If the CSV file originally came from a Powerbase application the settings in
- the CSV options window should be exactly the same as they were when the file
- was exported. The exception to this is the Quotes button which is shaded on
- import because Powerbase doesn’t need it. If the CSV file came from a PC or
- another RISC OS application you might have to load it into Edit to find out
- what separator and terminator are used.
-
- There are three more option switches at the bottom of the window. One causes
- each record to be displayed as it is imported. The import process is slower
- with this turned on but much more informative if you like to know how things
- are progressing. The second switch, when selected, strips any trailing
- spaces in the imported data-fields. Some database programs pad all fields to
- their maximum length by adding spaces to the end of the data where
- necessary. If you import such a file into Powerbase you will find that the
- caret will always be at the far right of the field even though visible
- characters do not fill the field, and some queries won’t work properly.
- Setting the Strip spaces switch before importing the file overcomes the
- problem. The third option switch determines how Sequence number fields are
- handled. If the switch is ON imported sequence numbers are ignored and new
- ones assigned in accordance with the field’s sequence number counter. With
- the switch OFF sequence numbers from the CSV file are imported without
- alteration.
-
- 8.4.2 Directing imported data to the correct fields
- ---------------------------------------------------
- If no fields on the record screen are highlighted (i.e. with ADJUST) and the
- CSV file does not contain a header record (see 8.1) then the import process
- proceeds according to the following rules:−
-
- • The first CSV field will be read into the first Powerbase field for
- which importing is allowed (Graphics fields, Buttons or fields which
- are merely labels will be ignored). The next CSV field will be read
- into the second Powerbase field and so on.
-
- • If the end of the CSV record is reached before all the relevant
- fields have been filled (data underflow) then the next CSV record
- will start a new Powerbase record, i.e. the reading won’t get out of
- step. It does not matter, therefore, if the CSV file omits null
- items at the end of a line.
-
- • If all relevant fields are filled before reaching the end of the CSV
- record (data overflow) Powerbase ignores the remainder of the line
- and skips to the beginning of the next CSV record before starting a
- new Powerbase record. This is also to keep the operation in step.
-
- There will be occasions when you don’t want to fill the Powerbase fields
- sequentially as just described. There are two ways of making the process
- more specific:−
-
- • Highlight the required Powerbase fields with ADJUST before starting
- the import. Data will then be read only into the highlighted
- fields, all other fields being ignored. The order in which the
- fields are filled is the order in which you highlighted them. The
- rules given above about underflow and overflow of data still apply.
-
- • Give the file a header record containing the tags or descriptors of
- required fields in the Powerbase record. (There is nothing to
- prevent you manually adding such a header to a CSV file which did
- not originate from a Powerbase application.) Importing then occurs
- just as if those fields were highlighted. The Print options window
- must reflect whether the tags or descriptors of fields are used.
-
- Do not use both a header and highlighting.
-
- 8.4.3 Importing plain text files
- --------------------------------
- It was previously explained that files created with Export CSV can have
- separators other than a comma. Such files will be of type &fff (plain text)
- instead of &dfe (CSV). They can still be imported into a Powerbase database
- but a certain amount of caution is needed because there are other
- circumstances in which a text file might be dropped on the record window. A
- properly-written script file (see Ch 9) would be recognised as such and
- therefore cause no problem, but any text file dropped onto the appropriate
- type of External field (i.e. a Text or Text Block) field would become linked
- to that field instead of being treated like a CSV file. If you are importing
- data from a plain text file and your record contains fields of the
- aforementioned types be sure to drop the file on the window background, not
- on the External field. You are strongly advised to use proper CSV files if
- at all possible.
-
- 8.4.4 What if the imported data won’t fit?
- ------------------------------------------
- There are two situations in which this can happen. The database might not
- contain enough free records to hold all the imported data and so you get a
- “Database full when reading CSV file” error. To avoid this either make sure
- the database is big enough before you start or place a suitable value in the
- Increment for expansion icon in the Change length window. The latter is
- accessible from the Utilities submenu of the icon-bar menu.
-
- The second situation is where an item is too long for the destined database
- field. When importing data Powerbase maintains a file called TooBig inside
- the database’s PrintJobs directory. Anything which won’t fit in the target
- field is written to this file together with information about where it was
- intended to go. No writable Powerbase field may be longer than 246
- characters and if an item of imported data exceeds this a note will be made
- in the TooBig file advising you to define an External field (Text Block or
- Text) for such data.
-
- 8.5 Using CSV files to modify existing records
- ==============================================
- Most database programs which support CSV import allow it to be used only for
- creating new records. Powerbase is unusual in that you can use a CSV file
- to modify existing records. This capability should be used with caution
- since careless use can irrevocably garble a database. There are three
- relevant radio buttons in the CSV options window. They are called Modify
- existing, With primary key and With rec. number and their actions are as
- follows:
-
- Modify existing
- ---------------
- No new records will be created when a CSV file is dropped onto the record
- window. The existing records will be accessed in the order determined by
- the current index and the new data will be merged into these records. You
- should, of course, either use ADJUST to highlight the fields into which the
- data should go or place a header in the CSV file specifying the field tags
- and then turn on the With header switch (see 8.4.2). If all the records are
- modified before the end of the CSV file has been reached a warning message
- is displayed. Note that it is the user’s responsibility to ensure that the
- data in the CSV file is in the correct order since Powerbase has no way of
- telling which data is destined for which record and can only proceed
- sequentially. As an additional precaution a warning message is displayed
- when you choose this option.
-
- With primary key
- ----------------
- This affects both export and import. When you export data with this button
- selected each record of the CSV file includes the primary key of the
- Powerbase record. (Try creating a CSV file with and without this option set
- and compare the files using Edit.) When importing such a file Powerbase
- will attempt to locate records with the same primary keys as the records in
- the CSV file. If a matching key is not found a new record will be created to
- receive the data. If it is found then the CSV data goes into the same
- record, overwriting any data which the target fields already contain. The
- option is only useful for transferring data between Powerbase databases
- which have the same primary key field. Don’t forget that Powerbase, by
- default, allows duplicate primary keys so it is possible for the new data
- going into the wrong record. You would be ill-advised to trust this method
- of import unless you are sure each primary key is unique.
-
- With record number
- ------------------
- This resembles the previous option. It allows data to be exported with the
- Powerbase record numbers included. Importing into another (or the same)
- database places the data in records having the same record numbers, again
- overwriting data which may already be present in the target fields. This
- option is only useful for transferring data between Powerbase databases in
- which corresponding records have identical record numbers. One use for it
- might be to export a set of data, load it into an editor and do some
- extensive searching-and-replacing, then put the modified data back into the
- original database.
-
- 8.6 Creating a new, working database from a CSV file
- ====================================================
- There is an option button in the CSV options window called With field data
- which only becomes active when the With header button is selected. With the
- switch selected the header record of a saved CSV file contains not only the
- field names but also the field lengths and types. Again it is suggested that
- you create a file and look at it in Edit. Each item in the header begins
- with a number. This is the field length in characters, i.e. the maximum
- allowed length in the Powerbase record field. Next comes a ¤ character which
- separates the length from the field name. Another ¤ separates the field name
- from the concluding number which determines the field type. (The type
- numbers may be inspected by looking at the ValStrings file in the Powerbase
- directory.)
-
- A file created in this form is not meant to be dropped onto the window of an
- open database. It should be dropped onto the Powerbase icon on the icon-bar
- when no database is open. Powerbase will attempt to convert the file into a
- functioning database. All the fields will be ranged on the left of the
- record window, one beneath another and the primary key will consist of the
- first four characters of the first field. Don’t try to create databases
- containing Check box, Button or External fields using this method; it’s only
- meant for the most basic type of database.
-
- 8.7 CSV files and validation tables
- ====================================
- You can save the contents of a validation table as a CSV file (see 5.6).
- You can also import data into a validation table by dropping a CSV file onto
- it, the procedure being just the same as that described for importing into
- database records.